Abstract

In this paper, we prove a new identity for the least-square solution of an over-determined set of linear equations
$Ax = b$, where $A$ is an $m \times n$ full-rank matrix, $b$ is a column vector of dimension $m$, and $m$ (the number of equations)
is larger than or equal to $n$ (the dimension of the unknown vector $x$). Generally, the equations are inconsistent and
there is no feasible solution for $x$ unless $b$ belongs to the column span of $A$. In the least-square approach, a candidate
solution is found as the unique $x$ that minimizes the error function $\|Ax - b\|_2$.

We propose a more general approach that consists of considering all the consistent subsets of the equations, finding
their solutions, and taking a weighted average of them to build a candidate solution. In particular, we show that
by weighting the solutions with the squared determinant of their coefficient matrix, the resulting candidate solution
coincides with the least-square solution.


Index Terms 

Over-determined linear equation, Least square solution. 

I. Introduction 

A. Over-determined Set of Linear Equations 

Let $A$ be an $m \times n$ full-rank matrix, let $b \in \mathbb{R}^m$ be a column vector, and consider the linear equation $Ax = b$,
to be solved for the unknown vector $x \in \mathbb{R}^n$. The theory and practice of solving such equations play a major role in
essentially every part of mathematics, including linear algebra, operations research, optimization, and combinatorics.
When $m \ge n$, we call the equations over-determined, and there is a solution if and only if $b$ belongs to the
column span of $A$. Generally, the equations are inconsistent and we need some criterion to build a candidate
solution.

One approach for finding a solution is the least-square approach, where we find a solution by minimizing
the quadratic form $\|Ax - b\|_2^2$. The resulting solution is given by $x = A^{\#} b$, where $A^{\#} = (A^t A)^{-1} A^t$ denotes the
pseudo-inverse of $A$. In estimation theory, this solution can be interpreted as the best linear unbiased estimator (BLUE) of a
signal $x$ observed via a linear channel given by the matrix $A$ and contaminated with i.i.d. Gaussian noise.
Note that in this case, if $b$ is in the column span of $A$, the resulting estimation error is zero.

Another approach for building a candidate solution is to average, in some way, all the possible sub-solutions.
To explain this more precisely, we first need to introduce some notation. For $k \in \mathbb{N}$, we define $[k] = \{1, 2, \dots, k\}$
to be the set of all integers from 1 up to $k$. We denote by $\mathcal{P}$ the set of all subsets of $[m]$ of size $n$, i.e.,
$\mathcal{P} = \{p \subseteq [m] : |p| = n\}$, where $|p|$ denotes the size of the subset $p$. For $p \in \mathcal{P}$, we define $A_p$ to be the $n \times n$ matrix
obtained by selecting the rows of the matrix $A$ whose indices belong to $p$, keeping their order as in $A$.

Suppose $p \in \mathcal{P}$ is such that $\det(A_p) \neq 0$. By restricting the equations to $A_p$, we can obtain a sub-solution
$x_p = A_p^{-1} b_p$, where $b_p$ is the sub-vector of $b$ consisting of the components with index in $p$, whose order is the
same as in $b$. Taking the weighted average of all possible sub-solutions with weights $\omega_p > 0$, $p \in \mathcal{P}$, we can
build a candidate solution as follows:

$$x^{\omega} = \frac{\sum_{p \in \mathcal{P}} \omega_p x_p}{\sum_{p \in \mathcal{P}} \omega_p}. \tag{1}$$

As the matrix $A$ is full-rank, there is at least one $p \in \mathcal{P}$ with a nonzero $\det(A_p)$, thus $x^{\omega}$ is well-defined. By
changing the associated weights $\omega_p$, we obtain a variety of candidate solutions for the over-determined equation
$Ax = b$.

Let us consider the weighting function $\omega_p = \det(A_p)^2$, which is equal to the squared determinant of the
sub-matrix $A_p$, and let us define the resulting solution by

$$x_{\mathrm{LS}} = \frac{\sum_{p \in \mathcal{P}} \det(A_p)^2 A_p^{-1} b_p}{\sum_{p \in \mathcal{P}} \det(A_p)^2}. \tag{2}$$

If $\det(A_p) = 0$ for a specific $p \in \mathcal{P}$, then $A_p^{-1}$ does not exist. With some abuse of notation, we still write the
corresponding term in Eq. (2); it plays no role because its weight $\det(A_p)^2$ is equal to 0.
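
As a quick sanity check, the case $n = 1$ can be worked out by hand. This example is added for illustration; $a_i$ and $b_i$ denote the entries of the column $A$ and of $b$, which is notation assumed here rather than taken from the text. Each subset $p$ is a single index $i$, so $A_p = a_i$ and $x_p = b_i / a_i$, and Eq. (2) reduces to the familiar single-unknown least-square formula:

```latex
x_{\mathrm{LS}}
  = \frac{\sum_{i=1}^{m} a_i^2 \,(b_i / a_i)}{\sum_{i=1}^{m} a_i^2}
  = \frac{\sum_{i=1}^{m} a_i b_i}{\sum_{i=1}^{m} a_i^2}
  = (A^t A)^{-1} A^t b .
```

Indices with $a_i = 0$ contribute nothing to either sum, in line with the convention for singular $A_p$.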


B. Our Contribution 

We prove that with the weighting $\omega_p = \det(A_p)^2$, the resulting solution $x_{\mathrm{LS}}$ in Eq. (2) coincides with the
least-square solution given by $A^{\#} b = (A^t A)^{-1} A^t b$. More importantly, this holds for every full-rank matrix $A$ and for
an arbitrary vector $b$. We summarize this in the following theorem.

Theorem 1. Suppose $A$ is a given $m \times n$ full-rank matrix with $m \ge n$, and assume that $b \in \mathbb{R}^m$ is an arbitrary
vector. Let $x_{\mathrm{LS}}$ be the weighted average solution given by Eq. (2). Then $x_{\mathrm{LS}} = (A^t A)^{-1} A^t b$, i.e., $x_{\mathrm{LS}}$ coincides
with the least-square solution.
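
The statement of Theorem 1 can be checked numerically on a small instance. The following is a minimal sketch, not part of the original paper: it uses exact rational arithmetic from the Python standard library, 0-based row indices, and helper names (`det`, `solve`, `weighted_subsolution_average`, `least_squares`) that are hypothetical and chosen only for this illustration. The example matrix and vector are arbitrary.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    total = Fraction(0)
    for k in range(len(M)):
        minor = [row[:k] + row[k + 1:] for row in M[1:]]
        total += (-1) ** k * M[0][k] * det(minor)
    return total


def solve(M, v):
    """Solve the square system M x = v by Cramer's rule (M must be invertible)."""
    d = det(M)
    x = []
    for j in range(len(M)):
        # Replace column j of M by the right-hand side v.
        Mj = [row[:j] + [v[i]] + row[j + 1:] for i, row in enumerate(M)]
        x.append(det(Mj) / d)
    return x


def weighted_subsolution_average(A, b):
    """Average the sub-solutions x_p with weights det(A_p)^2, as in Eq. (2)."""
    m, n = len(A), len(A[0])
    num = [Fraction(0)] * n
    den = Fraction(0)
    for p in combinations(range(m), n):  # row subsets, original order kept
        Ap = [A[i] for i in p]
        w = det(Ap) ** 2
        if w == 0:
            continue  # a singular A_p carries zero weight
        xp = solve(Ap, [b[i] for i in p])
        num = [s + w * x for s, x in zip(num, xp)]
        den += w
    return [s / den for s in num]


def least_squares(A, b):
    """Solve the normal equations (A^t A) x = A^t b exactly."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    return solve(AtA, Atb)


# Arbitrary 4 x 2 example: fit a line through four points.
A = [[Fraction(v) for v in row] for row in [[1, 0], [1, 1], [1, 2], [1, 4]]]
b = [Fraction(v) for v in [1, 3, 2, 6]]
x_avg = weighted_subsolution_average(A, b)
x_ls = least_squares(A, b)
print(x_avg, x_ls)  # both equal [1, 8/7] as exact fractions
```

Exact fractions are used so that the two results can be compared with `==` rather than up to a floating-point tolerance.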

C. Notation and Auxiliary Results 

In this section, we first introduce the notation required for the rest of the paper and prove some auxiliary results
that we need to prove Theorem 1. Let $B$ be an arbitrary $n \times n$ matrix and let $p \subseteq [m]$ be of size $|p| = n$. We
denote by $\mathrm{embb}(B, p, m)$ the embedding of the columns of $B$ inside an $n \times m$ matrix. More precisely, assume that
the components of $p$ are sorted with $p_1 < p_2 < \dots < p_n$. Then $\mathrm{embb}(B, p, m)$ is an $n \times m$ matrix whose $p_i$-th
column, $i \in [n]$, is equal to the $i$-th column of $B$, and all the other $m - n$ columns are set to zero.
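
The embedding can be sketched in a few lines of code. This is an illustrative sketch only: the function name `embb` mirrors the notation of the text, matrices are plain lists of rows, and indices are 0-based (the text uses 1-based indices). It also shows the property used later in the proof, namely that multiplying the embedded matrix by $b$ gives the same result as multiplying $B$ by the sub-vector $b_p$.

```python
def embb(B, p, m):
    """Return the n x m matrix whose p_i-th column is the i-th column of B.

    p holds 0-based column positions; all other columns are zero.
    """
    n = len(B)
    E = [[0] * m for _ in range(n)]
    for i, pi in enumerate(sorted(p)):
        for r in range(n):
            E[r][pi] = B[r][i]
    return E


B = [[1, 2],
     [3, 4]]
p = (0, 2)
b = [5, 6, 7, 8]

E = embb(B, p, 4)  # columns 0 and 2 hold the columns of B

# embb(B, p, m) b equals B b_p, where b_p keeps the entries of b indexed by p.
Eb = [sum(E[r][c] * b[c] for c in range(4)) for r in range(2)]
Bbp = [sum(B[r][i] * b[pi] for i, pi in enumerate(p)) for r in range(2)]
print(E, Eb, Bbp)
```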

Let $r, c \in \mathbb{N}$ be arbitrary numbers. We denote the linear space of all $r \times c$ real-valued matrices by $M_{\mathbb{R}}(r, c)$,
with the traditional matrix addition and scalar-matrix multiplication. For arbitrary matrices $M, N \in M_{\mathbb{R}}(r, c)$, we
define the bilinear form $\langle M, N \rangle = \mathrm{tr}(M N^t) = \sum_{i,j} M_{ij} N_{ij}$. It is not difficult to see that $\langle \cdot, \cdot \rangle$ defines
an inner product on $M_{\mathbb{R}}(r, c)$. We denote the trace and the determinant of a square matrix $M$ by $\mathrm{tr}(M)$ and
$\det(M)$, respectively. We need the following auxiliary results from linear algebra. We have included all the proofs
in Appendix A.

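Lemma 1. Let $r, c \in \mathbb{N}$ and let $M \in M_{\mathbb{R}}(r, c)$. If $\langle M, N \rangle = 0$ for every $N \in M_{\mathbb{R}}(r, c)$, then $M = 0$.

Lemma 2. Let $M$ be a square invertible matrix whose components depend on a parameter $u$. Then
$\frac{\partial}{\partial u} M^{-1} = -M^{-1} \left( \frac{\partial}{\partial u} M \right) M^{-1}$.

Lemma 3. Let $A$ be a square invertible matrix whose components depend on a parameter $u$. Then
$\frac{\partial}{\partial u} \det(A) = \det(A)\, \mathrm{tr}\!\left( A^{-1} \frac{\partial}{\partial u} A \right)$.

Lemma 4. Let $M$ and $S$ be $n \times n$ matrices, where $S$ is symmetric. Then $\mathrm{tr}(SM) = \mathrm{tr}(S M^t)$.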


Theorem 2 (Cauchy-Binet). Let $A$ and $B$ be $m \times n$ matrices with $m \ge n$. Then,

$$\det(A^t B) = \sum_{p \subseteq [m],\, |p| = n} \det(A_p) \det(B_p), \tag{3}$$

where $|p|$ denotes the number of elements of $p \subseteq [m]$.
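
The Cauchy-Binet formula is easy to verify numerically on small integer matrices. The sketch below is an illustration under stated assumptions: it uses 0-based row indices, a simple cofactor-expansion determinant, and a hypothetical helper name `cauchy_binet_sides`. The example matrices are arbitrary.

```python
from itertools import combinations


def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** k * M[0][k] * det([row[:k] + row[k + 1:] for row in M[1:]])
               for k in range(len(M)))


def cauchy_binet_sides(A, B):
    """Return (det(A^t B), sum over p of det(A_p) det(B_p)) for m x n inputs."""
    m, n = len(A), len(A[0])
    AtB = [[sum(A[k][i] * B[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    lhs = det(AtB)
    rhs = sum(det([A[i] for i in p]) * det([B[i] for i in p])
              for p in combinations(range(m), n))
    return lhs, rhs


A = [[1, 0], [1, 1], [1, 2], [1, 4]]
B = [[2, 1], [0, 3], [1, 1], [5, 2]]
lhs, rhs = cauchy_binet_sides(A, B)
print(lhs, rhs)                   # the two sides agree
print(cauchy_binet_sides(A, A))   # B = A gives det(A^t A) = sum of det(A_p)^2
```

The second call corresponds to the special case $B = A$ that is used in the proof of the main theorem.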


II. Proof of the Main Theorem 


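In this section, we prove Theorem 1. Using Eq. (2), we can write $x_{\mathrm{LS}}$ in the following form:

$$x_{\mathrm{LS}} = \frac{\sum_{p \in \mathcal{P}} \det(A_p)^2 A_p^{-1} b_p}{\sum_{p \in \mathcal{P}} \det(A_p)^2} \tag{4}$$

$$= \frac{\sum_{p \in \mathcal{P}} \det(A_p)^2\, \mathrm{embb}(A_p^{-1}, p, m)\, b}{\det(A^t A)}, \tag{5}$$

where in the last term we used the definition of $\mathrm{embb}(A_p^{-1}, p, m)$, which gives $A_p^{-1} b_p = \mathrm{embb}(A_p^{-1}, p, m)\, b$, and the
Cauchy-Binet formula of Theorem 2 for the denominator. Recall that for $p \in \mathcal{P}$ with elements $p_1 < p_2 < \dots < p_n$,
we denote by $\mathrm{embb}(A_p^{-1}, p, m)$ the $n \times m$ matrix that is all-zero except for its $p_i$-th column, which is equal to
the $i$-th column of $A_p^{-1}$, for $i \in [n]$. Now, we need to prove that for any $b \in \mathbb{R}^m$ and for any $m \times n$ full-rank matrix $A$, the
following identity holds:

$$(A^t A)^{-1} A^t b = \frac{\sum_{p \in \mathcal{P}} \det(A_p)^2\, \mathrm{embb}(A_p^{-1}, p, m)\, b}{\det(A^t A)}. \tag{6}$$

As this should be true for every $b \in \mathbb{R}^m$, we need to prove the following matrix identity:

$$\det(A^t A)\, (A^t A)^{-1} A^t = \sum_{p \in \mathcal{P}} \det(A_p)^2\, \mathrm{embb}(A_p^{-1}, p, m). \tag{7}$$

As a first step, it is easy to check that both sides are $n \times m$ matrices, thus the dimensions are compatible.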

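In order to prove the identity (7), let us define the function $f : M_{\mathbb{R}}(m, n) \to \mathbb{R}$ as follows:

$$f(A) = \det(A^t A) - \sum_{p \in \mathcal{P}} \det(A_p)^2. \tag{8}$$

Using the Cauchy-Binet formula as stated in Theorem 2, we obtain

$$\det(A^t A) = \sum_{p \in \mathcal{P}} \det(A_p) \det(A_p) = \sum_{p \in \mathcal{P}} \det(A_p)^2, \tag{9}$$

which implies that $f(A) = 0$ for every $A \in M_{\mathbb{R}}(m, n)$. Let $u = A_{ij}$ be a parameter denoting the component of $A$
at row $i$ and column $j$. As $f(A) = 0$ for every $A$, we have $\frac{\partial}{\partial u} f(A) = 0$. For the first term in Eq. (8), we have

$$\begin{aligned}
\frac{\partial}{\partial u} \det(A^t A)
 &\overset{(a)}{=} \det(A^t A)\, \mathrm{tr}\Big\{ (A^t A)^{-1} \frac{\partial}{\partial u} (A^t A) \Big\} \\
 &\overset{(b)}{=} \det(A^t A)\, \mathrm{tr}\Big\{ (A^t A)^{-1} \Big( \big(\tfrac{\partial}{\partial u} A\big)^t A + A^t \tfrac{\partial}{\partial u} A \Big) \Big\} \\
 &\overset{(c)}{=} \det(A^t A)\, \mathrm{tr}\Big\{ (A^t A)^{-1} \Big( A^t \tfrac{\partial}{\partial u} A + A^t \tfrac{\partial}{\partial u} A \Big) \Big\} \\
 &= 2 \det(A^t A)\, \mathrm{tr}\Big\{ (A^t A)^{-1} A^t \tfrac{\partial}{\partial u} A \Big\} \\
 &\overset{(d)}{=} 2 \det(A^t A)\, \mathrm{tr}\big\{ (A^t A)^{-1} A^t U_{ij} \big\} \\
 &\overset{(e)}{=} 2 \det(A^t A)\, \big\langle (A^t A)^{-1} A^t,\; U_{ij}^t \big\rangle,
\end{aligned}$$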

where (a) follows from Lemma 3 applied to the matrix $A^t A$, (b) follows from the product rule applied to $A^t A$, (c)
follows from Lemma 4 applied to the symmetric matrix $(A^t A)^{-1}$ and the matrix $\big(\frac{\partial}{\partial u} A\big)^t A$, (d) results by taking
the component-wise derivative of $A$ with respect to $u = A_{ij}$, which we denote by $U_{ij}$, and (e) results from
the definition of the inner product for two matrices, since $\mathrm{tr}(X U_{ij}) = \langle X, U_{ij}^t \rangle$ for any $n \times m$ matrix $X$.
We can simply check that $U_{ij}$ is an $m \times n$ matrix with all-zero
components except for the $ij$-th component, which is equal to 1.

Now, taking the derivative of the other term in Eq. (8) with respect to $u = A_{ij}$, we obtain

$$\begin{aligned}
\frac{\partial}{\partial u} \sum_{p \in \mathcal{P}} \det(A_p)^2
 &= \sum_{p \in \mathcal{P}} 2 \det(A_p) \frac{\partial}{\partial u} \det(A_p) \\
 &\overset{(a)}{=} \sum_{p \in \mathcal{P}} 2 \det(A_p) \det(A_p)\, \mathrm{tr}\Big( A_p^{-1} \frac{\partial}{\partial u} A_p \Big) \\
 &\overset{(b)}{=} \sum_{p \in \mathcal{P}} 2 \det(A_p)^2\, \mathrm{tr}\Big( \mathrm{embb}(A_p^{-1}, p, m) \frac{\partial}{\partial u} A \Big) \\
 &\overset{(c)}{=} 2\, \mathrm{tr}\Big\{ \sum_{p \in \mathcal{P}} \det(A_p)^2\, \mathrm{embb}(A_p^{-1}, p, m)\, U_{ij} \Big\} \\
 &\overset{(d)}{=} 2 \Big\langle \sum_{p \in \mathcal{P}} \det(A_p)^2\, \mathrm{embb}(A_p^{-1}, p, m),\; U_{ij}^t \Big\rangle,
\end{aligned}$$

where (a) results from Lemma 3 applied to the matrix $A_p$. We also have (b) from the definition of the embedding of the
$n$ columns of $A_p^{-1}$ in an $n \times m$ matrix. In particular, notice that as the remaining columns of $\mathrm{embb}(A_p^{-1}, p, m)$ are
all zero, we can replace $A_p$ by $A$. Finally, (c) results from the linearity of the trace operator $\mathrm{tr}$, and (d) follows
from the definition of the inner product. Therefore, we obtain that


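$$0 = \frac{\partial}{\partial u} f(A) = 2 \Big\langle \det(A^t A)\, (A^t A)^{-1} A^t - \sum_{p \in \mathcal{P}} \det(A_p)^2\, \mathrm{embb}(A_p^{-1}, p, m),\; U_{ij}^t \Big\rangle. \tag{10}$$

Notice that the equality in Eq. (10) holds for all matrices $U_{ij}$, $i \in [m]$, $j \in [n]$. As the matrices $U_{ij}^t$ form an orthonormal basis for
the linear space $M_{\mathbb{R}}(n, m)$, from Lemma 1 and the linearity of the inner product it immediately results that

$$\det(A^t A)\, (A^t A)^{-1} A^t = \sum_{p \in \mathcal{P}} \det(A_p)^2\, \mathrm{embb}(A_p^{-1}, p, m).$$

From Eq. (7), this is exactly what we needed to prove.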
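
The matrix identity (7) can also be checked directly on small integer matrices. The sketch below is an illustration, not the authors' procedure. It relies on the standard fact that $\det(M) M^{-1}$ equals the adjugate of $M$, so the left-hand side becomes $\mathrm{adj}(A^t A) A^t$ and each term $\det(A_p)^2 A_p^{-1}$ becomes $\det(A_p)\, \mathrm{adj}(A_p)$. This keeps all arithmetic in integers and handles singular $A_p$ without special cases. The helper names and the example matrix are assumptions made for this illustration, and indices are 0-based.

```python
from itertools import combinations


def det(M):
    """Determinant by Laplace expansion; the empty matrix has determinant 1."""
    if not M:
        return 1
    return sum((-1) ** k * M[0][k] * det([row[:k] + row[k + 1:] for row in M[1:]])
               for k in range(len(M)))


def adjugate(M):
    """Adjugate of M: entry (j, i) is the (i, j) cofactor of M."""
    n = len(M)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            minor = [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]
            adj[j][i] = (-1) ** (i + j) * det(minor)
    return adj


def identity7_sides(A):
    """Return both sides of Eq. (7) as n x m integer matrices."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    adj_AtA = adjugate(AtA)
    # Left side: det(A^t A) (A^t A)^{-1} A^t = adj(A^t A) A^t.
    lhs = [[sum(adj_AtA[r][k] * A[c][k] for k in range(n)) for c in range(m)]
           for r in range(n)]
    # Right side: sum over p of det(A_p) adj(A_p), embedded at the columns in p.
    rhs = [[0] * m for _ in range(n)]
    for p in combinations(range(m), n):
        Ap = [A[i] for i in p]
        d, W = det(Ap), adjugate(Ap)
        for i, pi in enumerate(p):
            for r in range(n):
                rhs[r][pi] += d * W[r][i]
    return lhs, rhs


A = [[1, 0], [1, 1], [1, 2], [1, 4]]
lhs, rhs = identity7_sides(A)
print(lhs)
print(rhs)
```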


Appendix A 

Proofs of the Auxiliary Results
In this section, we provide the proofs of the auxiliary results. 

Proof of Lemma 1. Let $i \in [r]$, $j \in [c]$ be arbitrary numbers and let $N$ be an all-zero matrix except for the $ij$-th
element, which is set to 1. It results that

$$0 = \langle M, N \rangle = \sum_{k, \ell} M_{k\ell} N_{k\ell} = M_{ij}.$$

As this is true for arbitrary $i$ and $j$, it results that $M = 0$.

Proof of Lemma 2. Let $I$ be the identity matrix of the same order as $M$. Taking the derivative of both sides of
the identity $I = M M^{-1}$, and using the product rule, we obtain that

$$0 = \Big( \frac{\partial}{\partial u} M \Big) M^{-1} + M \frac{\partial}{\partial u} M^{-1},$$

which implies that $\frac{\partial}{\partial u} M^{-1} = -M^{-1} \big( \frac{\partial}{\partial u} M \big) M^{-1}$.

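Proof of Lemma 3. Assume that $A$ is a $d \times d$ matrix and let us denote by $A_{ij}$ the component of $A$ in row $i$
and column $j$. We first find $\frac{\partial}{\partial A_{ij}} \det(A)$ and use the chain rule to obtain

$$\frac{\partial}{\partial u} \det(A) = \sum_{i, j \in [d]} \frac{\partial \det(A)}{\partial A_{ij}} \frac{\partial A_{ij}}{\partial u}. \tag{11}$$

Notice that in order to compute $\det(A)$, we can expand it with respect to the $i$-th row, where we obtain

$$\det(A) = \sum_{k \in [d]} (-1)^{i+k} A_{ik} \det(\tilde{A}_{ik}), \tag{12}$$

where $\tilde{A}_{ik}$ is the $(d - 1) \times (d - 1)$ matrix obtained after removing the $i$-th row and the $k$-th column of the matrix
$A$. In particular, it can be immediately checked that the only term in the summation (12) that depends on $A_{ij}$ is
$(-1)^{i+j} A_{ij} \det(\tilde{A}_{ij})$, thus we obtain

$$\frac{\partial}{\partial A_{ij}} \det(A) = (-1)^{i+j} \det(\tilde{A}_{ij}) = \mathrm{adj}(A)_{ji}, \tag{13}$$

where $\mathrm{adj}(A)$ denotes the adjugate (classical adjoint) of the matrix $A$. Moreover, from the formula $A^{-1} = \mathrm{adj}(A) / \det(A)$ for the inverse of
the matrix $A$, we immediately obtain that

$$\frac{\partial}{\partial A_{ij}} \det(A) = \det(A)\, (A^{-1})_{ji}. \tag{14}$$

Using the chain rule as in Eq. (11), we have

$$\frac{\partial}{\partial u} \det(A) = \det(A) \sum_{i, j} (A^{-1})_{ji} \frac{\partial A_{ij}}{\partial u} = \det(A)\, \mathrm{tr}\Big( A^{-1} \frac{\partial}{\partial u} A \Big),$$

where $\mathrm{tr}$ denotes the trace operator and where $\frac{\partial}{\partial u} A$ denotes the component-wise partial derivative of $A$ with respect
to $u$.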
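
As a small worked instance of Lemma 3, consider a $2 \times 2$ matrix in which only the top-left entry depends on $u$. The matrix below is chosen arbitrarily for illustration and is assumed invertible, i.e., $3u \neq 2$. Both sides of the lemma evaluate to 3:

```latex
A(u) = \begin{pmatrix} u & 1 \\ 2 & 3 \end{pmatrix}, \qquad
\det(A) = 3u - 2, \qquad
\frac{\partial}{\partial u} \det(A) = 3,
\\[4pt]
A^{-1} = \frac{1}{3u - 2} \begin{pmatrix} 3 & -1 \\ -2 & u \end{pmatrix}, \qquad
\frac{\partial}{\partial u} A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},
\\[4pt]
\det(A)\, \mathrm{tr}\Big( A^{-1} \frac{\partial}{\partial u} A \Big)
  = (3u - 2) \cdot \frac{1}{3u - 2}\,
    \mathrm{tr} \begin{pmatrix} 3 & 0 \\ -2 & 0 \end{pmatrix}
  = 3 .
```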

Proof of Lemma 4. The proof simply follows from the properties of the trace operator:

$$\mathrm{tr}(SM) = \mathrm{tr}\big( (SM)^t \big) = \mathrm{tr}(M^t S^t) = \mathrm{tr}(M^t S) = \mathrm{tr}(S M^t),$$

where we used the symmetry of $S$ and the fact that for arbitrary square matrices $K$, $L$ of the same dimension,
$\mathrm{tr}(KL) = \mathrm{tr}(LK)$.